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1 Introduction 


Getting a construction project done on time is a major performance goal that many DOTs including 
Montana DOT (MDT) constantly pursue and monitor. However, most DOTs continue to struggle 
to meet the schedule performance target of their highway projects. For example, in 2017, 
approximately $144.5 millions of road projects in Montana experienced schedule delay (Fraser, 
2017). Estimating and tracking construction project duration is crucial in the project development 
process since it not only directly affects the agency's key performance indicator, but also affects 
the contractor selection, construction costs, construction quality, safety, and public satisfaction. 
Both unreasonably short or long construction contract times can result in negative consequences 
such as high bid prices, lack of qualified bidders, poor work quality, claims and disputes, prolonged 
inconvenience to the traveling public, lack of innovations, increased administration costs, and 
safety issues (FHWA, 2002; Hildreth, 2005; H. S. Jeong et al., 2009). 


MDT is in the process of modernizing their contract time determination processes by developing 
user-friendly tools to facilitate the estimation of project duration and ultimately the determination 
of contract time. MDT has successfully developed the Production Rate Estimation Tool (PRET) 
for controlling activities and visual construction sequence logic diagrams for common types of 
highway projects (Jeong et al., 2019; Jeong & Alikhani, 2020). These tools are bottom-up tools 
that can help support specific work tasks during the scheduling and contract time development 
processes. As a continuation of this modernization effort, there is a need to develop a top-down 
tool that can estimate a project's duration when a limited amount of project information is available 
during the preconstruction stages. This tool can be used throughout the preconstruction phases to 
quickly determine a reasonable project duration for proper project planning and delivery, and it 
can also be used as a reality check tool along with the bottom-up tools during the procurement 
stage. 


In recent years, artificial intelligence (AI) technologies have improved their technical capabilities 
for pattern recognition and prediction. Promising AI techniques such as Artificial Neural Networks 
(ANNs) are capable of processing various types of data and learning complex patterns to make a 
prediction with reliable accuracy. An AI-based data-driven model can leverage historical project 
characteristics and performance data to estimate a reliable project duration for a new project. In 
this research, an AI-based model and its tool were developed using historical highway project data. 
The model identifies the most influential factors that affect project duration such as estimated 
construction cost, major controlling work items and their quantities. It uses those factors as input 
variables to estimate the project duration with a certain level of confidence. Additionally, a 
regression model with the same set of input variables was developed as a companion to the AI 
model. A robust Microsoft Excel implementation tool was also developed to support a quick and 
reliable estimation of a project's duration. 


1.1 Project Objectives 


The goal of this research project was to develop a) an Al-based estimation model that takes key 
highway project characteristics and estimates a reliable project duration, and b) a Microsoft Excel- 
based tool that can operationalize the AI model. The specific objectives of this research project 
include i) obtain and analyze historical project data, 11) identify the most influential factors that 
affect the duration of highway projects, iii) develop an AI-based project duration estimation model 
and validate the results, and iv) develop an MS Excel-based tool that provides a user-friendly 
interface for using the AI model. 


1.2 Work Tasks 


To achieve the goals of this project, the following work tasks were performed. 


1.2.1 Task 1: Critical Review of Current Leading Practices 


The research team reviewed current top-down project duration estimation methods developed by 
some DOTs such as the Kentucky Transportation Cabinet, Indiana DOT, Ohio DOT, and Colorado 
DOT. The strengths and weaknesses of those methods in terms of estimation accuracy, technical 
approach, data requirements, and user interface were analyzed and documented. The research team 
also reviewed the state-of-the art AI techniques such as artificial neural networks to identify the 
most effective algorithm for project duration estimation. The analysis and review results of the 
current top-down project duration estimation methods used by some DOTs were utilized in 
evaluating the feasibility and suitability of different types of AI techniques for this research. 


1.2.2 Task 2: Data Collection, Preliminary Analysis, and Meeting with MDT Schedulers 


In this task, the research team obtained the historical project data from MDT. The obtained data 
were cleaned, processed, normalized, and organized to be suitable for this research. The research 
team first applied explanatory statistical methods to understand the collected data, determine data 
characteristics and define any visible patterns of data attributes such as correlation. Statistically 
significant variables were identified in this process and these variables became candidate variables 
for developing an AI-driven project duration estimation model. 


The research team conducted a virtual meeting with MDT schedulers and representative district 
engineers to discuss the preliminary findings of the research and obtain their feedback. The 
meeting was used to confirm the influential factors identified for project duration estimation. The 
research team demonstrated a preliminary AI model to obtain their feedback and identify areas of 
improvement of the model to assure its practicality. 


1.2.8 Task 3: AI Model Development 


In this task, the findings and results from Tasks 1 and 2 were utilized to develop a fully functioning 
AI model for project duration estimation. The model receives input variables and passes them to 
hidden layers, where the input variables are processed together with a non-linear function to predict 
the output value. To ensure the validity of the models, approximately 80% of the dataset was used 
for training the model and the remaining dataset was used to validate the reliability and accuracy 
of the model. A multivariate regression model was also developed with the same set of input 
variables as a companion of the AI model. 


1.2.4 Task 4: Tool Development 


The research team developed a user-friendly tool based on the AI and regression models developed 
in Task 3 that operates in Microsoft Excel environment for easy implementation. The tool is named 
AI-PDET (Artificial Intelligence based Project Duration Estimation Tool). The research team used 
Visual Basic for Applications (VBA) in Microsoft Excel to automate computational tasks and 
create a user-friendly interface. The research team developed a user's manual with real examples 
to demonstrate how to use the tool to estimate the duration of a project. 


1.2.5 Task 5: Guidance on Tool Maintenance and Database Update 


A detailed guide on tool maintenance and database updates was developed in this task as the tool 
needs to stay relevant to new projects. It explains how to obtain and clean the new project data and 
describes different coding parts and actions that need to be taken to transfer the attributes of the 
two models to AI-PDET. 


1.2.6 Task 6: Final report, Project Webinar, Final Presentation and Implementation 
Meeting 

In this task, a draft of the final report that encompasses all task results, findings, and products was 
prepared for the technical panel's review. All comments from the technical panel on the draft were 
incorporated into the final report. Other required deliverables such as the project summary report 
and the performance measures report were submitted with the approval of the final report. A 
project webinar along with a final presentation was provided to the technical panel for rapid 
dissemination of the research findings. Also, an implementation meeting was conducted with the 
project technical panel to review the research team's implementation recommendations and to 
determine implementation recommendations. The research team documented the discussion 
results in the form of an implementation report. 


2 Literature Review 


The FHWA requires State Transportation Agencies (STAs) to have adequate written procedures 
for the determination of contract time. Contract time is defined as the maximum time allowed in 
the contract for completion of all work contained in the contract documents (FHWA, 2002). 
Current practices of Contract Time Determination (CTD) identified by a recent survey indicated 
68% of DOTs across the US participating in the survey had a formal procedure for CTD (Taylor 
et al., 2017). Fifty three percent of the DOTs have developed agency-specific production rates of 
controlling work items and 39% of the DOTs use a project-specific sequence logic to estimate 
contract time. 


Two main approaches are mostly used for project duration estimation to estimate the contract time: 
the bottom-up and the top-down approach. The bottom-up approach develops a pre-construction 
schedule to compute the total project duration that includes a) estimating the durations of work 
items using the production rates and b) determining activity relationships using activity sequence 
logics (FHWA, 2002; Daradkah et al., 2018; Jeong & Alikhani, 2020). DOTs have developed 
specific tools such as spreadsheet-based production rate estimation tools, production rate 
adjustment tools for weather and site factors, activity sequencing logic diagrams and contract time 
determination templates to help the scheduler develop a bar chart or a CPM-based schedule to 
determine the project duration. 


For example, Virginia DOT categorizes their highway projects into six types and uses production 
rates and sequence logics for estimating project duration (Gondy & Hildreth, 2007). The Kentucky 
Transportation Cabinet and Texas DOT have developed a series of tools to support production rate 
estimation and construction activity sequencing using the critical path method concept (Connor, 
2004). MDT developed an Excel-based tool to estimate production rates of major controlling 
activities considering factors affecting production rates such as work item quantity, work hours, 
work types, different seasons of work, districts, area types (urban/rural), and budget types (Jeong 
et al., 2019). The tool is used to determine a reliable duration of each controlling activity in project 
schedule. Also, MDT developed activity sequence logic templates that include major controlling 
activities for each common project type to help schedulers determine the sequence of activities in 
project schedules (Jeong & Alikhani, 2020). Although the bottom-up technique provides a reliable 
estimation of project duration before construction, it requires that the project is well-defined and 
the project information such as work items and their quantities is known with high certainty. 
However, such detailed information is not usually available in the early pre-construction stages. 


A top-down approach for project duration estimation can be used in the pre-construction phases 
when the project's scope is not well defined. However, even at this early stage of project 
development, a DOT needs a reasonable project duration estimation for project planning and 
budgeting purposes. Therefore, a top-down project duration estimation is desirable during the pre- 
construction stages when a limited amount of project information is available, and the project 


design is not finalized. Also, this top-down tool will be handy to check the reasonableness of the 
project duration and the contract time calculated using the detailed bottom-up methods at the end 
of the design stage. 


Some DOTs such as the Kentucky Transportation Cabinet, Indiana DOT, Ohio DOT, and Colorado 
DOT have developed a top-down project duration estimation tool based on the fact that there is 
strong correlation between project duration and key project characteristics such as project type, 
estimated cost, project location, and bid quantitates (Attal, 2010; KYTC, 2014; Taylor et al., 2017; 
Ohio DOT, 2020). Regression models were mostly used to establish the statistical relationship 
between key project parameters and project duration. In a survey, some DOTs reported that the 
regression method was more accurate and easier to use than their previous contract time estimation 
methods that are based upon production rates and generic precedence logic diagrams (Taylor et 
al., 2017). For example, Ohio DOT (2020) developed a regression model for each project type (in 
total, 19 types) using eight years of project data including project cost, project type, project 
location, and starting season to estimate project duration. Ohio DOT uses these models to estimate 
the duration of a project in early preconstruction stages by determining a mean duration with 90% 
and 95% confidence level. The agency uses such regression tools for preliminary estimation of 
contract time and uses production rate charts and scheduling tools for final setting of contract time. 


2.4 Current Leading Practices 


The CTD system of Texas DOT (TxDOT) was initially developed by Hancher et al. (1992) and 
further improved by Connor (2004) who developed a production rate estimation system. TxDOT 
categorized its highway projects into 13 project types. The CTD system asks the user to input work 
item quantities and the program finds a default production rate for each work item. The default 
production rates of the system can be adjusted for a particular project by the user. The system takes 
factors such as the location, traffic condition, complexity, soil conditions, and weather that affects 
the production rates and duration of work items. The system allows the user to determine the 
relationship of activities to finally generate a bar chart presenting the activities, relationships, and 
their duration (Hancher et al., 1992; Texas DOT, 2018). 


Kentucky Transportation Cabinet (KYTC) (2014) divided highway projects into small size (lower 
than $1M) and large size (higher than $1M) and developed unique regression models for ten 
project types of small size and five project types of large size. The small projects account for more 
than 90% of the KYTC highway projects. The regression models require project identification 
number, construction estimate, letting date, and the selected design project type as input variables. 
The model returns an estimated lower, mean, and upper range of completion dates and working 
days with 95% of confidence level. For large projects, the input variables for the model include 
the construction estimate and key bid item quantities (KY TC, 2014). 


A Microsoft Excel tool was developed to facilitate the estimation process as shown in Figure 2.1. 
Once the input variables are entered, the tool calculates and provides a mean, lower, and upper 
bound of the estimated project duration. 


ex [1 —] Rage ^ — | 
Lower Upper 
Mean Duration [Duration Duration 
(Days) (Days) (Days) 


n/a 


Figure 2.1 Screenshot of KYTC Excel tool (KyTC, 2014) 


Ohio DOT (2020) developed a unique regression model for each project type (total of 19 types) 
using eight years of data. The regression models take multiple project variables such as project 
cost, project type, project location, and starting season to estimate project duration with 90% and 
95% confidence interval. Ohio DOT applies such regression models for a preliminary estimation 
of contract time and uses production rates and scheduling tools for final setting of contract time 
that is illustrated in Figure 2.2 (Taylor et al., 2017). The project duration estimation tool is an 
Microsoft Excel tool that includes three steps. The first step is to input major work items for a new 
project then the tool automatically calculates the production rates and adjusts them with adjusting 
factors. The adjusting factors are determined based on factors such as the location of the project 
(rural/urban), traffic, project complexity, soil condition, and the size of the project. After 
production rates are calculated, the user inserts the major work items in the Bar Chart and identifies 
the work item relationships. Step 2 computes the overall duration of the project. Step 3 captures 
the total duration and the starting month and considers weekends, holidays, and weather days to 
adjust the project duration and estimate the completion date of the project (Ohio DOT, 2020). 
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Figure 2.2 Screenshot of Ohio DOT Excel Tool (Ohio DOT, 2020) 


Virginia DOT (Gondy & Hildreth, 2007) categorizes its highway projects into six types and uses 
production rates and sequence logics for CTD. As Figure 2.3 shows, for less complex projects, the 
sum of the durations for major work items accounts for the project duration. But, for more complex 
projects, a scheduling technique such as CPM or Bar Chart is used to develop a schedule for CTD 
after determining durations for major work items. 
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Figure 2.3 Screenshot of Virginia DOT CTD Tool (Gondy & Hildreth, 2007) 


Nevett et al. (2020) collected highway project data from Colorado DOT that included information 
about quantities of work items, item level costs, and durations of 15,000 projects. They analyzed 
22 variables and developed a multi linear regression model that can receive ten project 
characteristics such as project type, cost, traffic condition, and work item quantities to predict 
project duration. 


Jeong et al. (2008) developed a comprehensive automated scheduling system for Oklahoma DOT 
(OKDOT). The research team categorized OKDOT highway projects into three tiers based on 
complexity (tier I has the highest complexity) and developed the scheduling system for Tier II and 
III that account for more than 90% of OKDOT highway projects. They developed a standalone 
computer application and linked Microsoft Project to a database of project types and production 
rates in Microsoft Access. The application receives estimated quantities of controlling work items 
from the user and finds the associated production rates of the work items from the database and 
computes durations of activities that can be adjusted by the user (Figure 2.4). Then, the application 
exports the information to a Microsoft Project that includes the pre-established activity sequence 
logics for different project types to determine the relationship of activities using CPM and create 
a reliable schedule that can be used as a basis for contract time (Figure 2.5). 
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Figure 2.4 Screenshot of OKDOT CTD Tool (Jeong et al., 2008) 
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Figure 2.5 Total project duration and CPM diagram in Microsoft Project (Jeong et al., 2008) 


2.2 Discussion on the Leading Practices 


The top-down approach of project duration estimation is used as a quick and reliable approach in 
different DOTs for determining the contract time before the beginning of construction. DOTs 
developed different computer systems that take key project characteristics as the inputs and predict 
project duration as the output. To operationalize the models, different software programs were 
developed. A summary on the tools, methods, and project attributes is provided below. 


Project attributes: key project characteristics such as project location, type, size, 
estimated cost, traffic condition, soil condition, weather, as well as work item quantities 
were considered as input variables to predict the project duration. 


Software programs: Microsoft Excel is widely used as the main tool (Ohio DOT, 2020; 
Taylor et al., 2013; Texas DOT, 2018). Researchers used Visual Basic as a programming 
tool in Microsoft Excel to further improve the capability of project information analysis in 
DOTs (Jeong, Shane, et al., 2019). Other software programs such as Lotus, Flash-up, and 
Microsoft project were used to link the database to the main tool and present the bar chart 
schedule (Jeong et al., 2008; Texas DOT, 2018). Due to its simplicity, high capabilities, 
and availability of Microsoft Excel in DOTs, MS Excel will be used in this research as the 
main tool. 


Modeling approach: Most DOTS have used statistical methods such as regression models 
as the major modeling approach. Some DOT Ss used a stochastic approach to estimating a 
range of project durations within a certain confidence level (Ohio DOT, 2020; KYTC, 


2014; Taylor et al., 2013). The stochastic approach can provide a more realistic insight of 
project duration since there are many uncertainties associated with project variables. 


2.3 Influential Factors on Project Duration 


Recent studies identified key project characteristics as predictors of project duration estimation as 
shown in Table 2.1. Controlling work items are activities that are highly likely to be on the critical 
path of a project. Controlling work items and their quantities were identified as key influential 
factors by previous researchers. Project location is another contributing factor to project duration. 
For example, traffic congestion in urban areas may prolong the duration of an urban project. Also, 
difficulties of carrying materials to mountainous areas may also increase the project duration. 
Project work type is critical in project duration estimation since each project type has its specific 
activities and sequences. Project’s estimated cost is another factor determining the size of the 
project that is influential in project duration, since larger projects tend to have more expenses, 
larger material procurement, and more equipment pieces. Weather condition is highly influential. 
Cold weather can slow down the construction process especially in districts where the winter is 
long and severe. 


Table 2.1 Highway project attributes used for project duration estimation. 


No. Attribute References 
Ohio DOT (2020), Abdel-Raheem et al. (2020), Taylor et 
1 Project location al. (2013), Attal (2010), Hegazy and Ayed (1998), 
Hancher et al. (1992) 
2 Project size Nevett et al. (2020), KYTC (2014), Jeong et al. (2008) 
: Nevett et al. (2020), Ohio DOT (2020), KYTC (2014), 
oy SESEDUBOUODNE Wilmot and Mei (2005) 


Ohio DOT (2020), Mensah et al. (2016), KYTC (2014), 


Controlling Work iten Agi and Beldeki 3009). Jeong et al. (2008). 


quantities Hancher et al. (1992) 
5 Scope of work Attal (2010) 
6 Contract execution l 

date Ohio DOT (2020), Attal (2010) 
7 Design method Hoffman et al. (2007) 


Nevett et al. (2020), Ohio DOT (2020), Attal (2010), 
Jiang & Wu, (2004), Skitmore & Ng (2003) 


9 Population of the area Leu & Yang (1999) 


8 Project type 
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10 Number of lanes Mahmood et al. (2017), Williams & Heldreth (2009) 


gu Ohio DOT (2020), Nevett et al. (2020), Jiang & Wu, 
pp ehecondupT (2004), Hancher et al. (1992) 


12 Production rates Connor (2004), Jiang & Wu (2004) 
13 Weather conditions Ezeldin & Sharara (2006), Jiang & Wu (2004) 


Use of project characteristics in a project duration estimation model is highly dependent on the 
availability of data. In general, applying more project features results in a higher accuracy and 
reliability of the estimated duration. 


2.4 Review of Statistical Methods and Artificial Neural Networks (ANNs) 


Project duration estimation using key project characteristics requires a statistical model that can 
estimate the relationship between multiple numerical and categorical independent variables (i.e., 
project characteristics) and one numerical dependent variable (i.e., the project duration). In 
statistics, regression analysis is primarily used for such purposes. 


Regression models were mostly used to establish the statistical relationship between key project 
parameters and the project duration. Some DOTs reported that their regression method was more 
accurate and easier to use than their previous contract time estimation methods that were primarily 
based upon production rates and generic precedence logic (Taylor et al., 2017). Kentucky, Indiana, 
and Ohio recently developed single variate and multivariate regression models that use cost 
estimates and selected bid item quantities to estimate contract time (Jiang & Wu, 2004; Zhai et al., 
2016). Nevett et al. (2020) collected highway project data from Colorado DOT that included 
information about construction quantities, cost, and contract time of 15,000 projects. They 
analyzed 22 variables and developed a multi linear regression model that uses ten influential 
variables and predicts project duration. Ohio DOT (2020) developed a regression model for each 
project type (in total, 19 types) using eight years of project data including project cost, project 
type, project location, and starting season to estimate project duration. Ohio DOT uses these 
models to estimate the duration of a project in early preconstruction stages by determining a mean 
duration with 90% and 95% confidence level. The agency uses such regression tools for 
preliminary estimation of contract time and use production rate charts and scheduling tools for the 
final setting of contract time. Ohio DOT uses these models to estimate an early construction 
duration with 90% and 95% confidence level (Taylor et al., 2017). KYTC (2014) developed 
regression models for ten project types of small size and five project types of large size that receive 
project attributes such as work item quantities and project cost to predict project duration with a 
defined level of certainties. Nevett et al. (2020) analyzed 22 variables and developed a multi linear 
regression model that can receive ten project characteristics to predict the project duration for 
Colorado DOT. 
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Although regression analysis models have been used vastly in literature for project duration 
estimation, Artificial Neural Networks (ANNs) have been applied recently because of their 
capability of recognizing complex non-linear relationships between inputs and output (Attal 2010; 
Mensah et al., 2016; Petruseva et al., 2019; Cheng et al., 2019; Karaca et al., 2020; Alikhani et al., 
2020). Attal (2010) identified six key project characteristics in highway projects of Virginia DOT 
and used them as input variables and applied ANNS to predict project duration. They achieved the 
accuracy of 91% for ANNs in predicting contract time. They compared the accuracy of ANNs with 
regression analysis, which was 91% and 89% respectively, and concluded that the ANNs had a 
better performance in prediction. Al-saadi et al. (2017) used ANNs for predicting contract time 
and achieved the accuracy of 90% and compared the method with other techniques and concluded 
that the ANN worked more accurate than other methods. Cheng et al. (2019) used ANN to obtain 
schedule to completion of construction projects and achieved 99% of accuracy. Gransberg et al. 
(2017) conducted a research study for the MDT and used ANNs for early cost estimation of 
highway projects. They proposed a top-down estimating approach and embedded the ANN model 
into a Microsoft Excel tool to facilitate the usability of the model. They concluded that MDT could 
enhance the accuracy of cost estimation using the proposed ANN model compared with their 
existing method. 
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3 Data Collection and Preliminary Analysis 


This chapter discusses the characteristics of collected data and preliminary analysis results. 
Historical highway project bid data were obtained from MDT and analyzed to provide insights 
about MDT highway projects and create a database to develop project duration estimation models. 
This chapter provides an overview of highway project data of MDT by describing available data, 
major project factors, and statistics on project characteristics. 


3.1 Data Overview 


Historical bid data of 1,090 highway projects from 2008 to 2019 were collected from MDT. Data 
attributes include project numbers, location (urban/rural), bid price, bid duration, adjusted cost, 
charged days, work type, letting date, and bid item (work item) title and quantities. 


3.1.1 Project Work Type 


Highway projects are classified into different project work types based on project activities and 
work descriptions. Table 3.1 shows different project work types and the number of historical 
projects assigned to each work type. Overlays is the most common project type in MDT, followed 
by Reconstruction and grading, Safety, Seal & cover, and Bridge construction, rehab, and removal. 
The top five work types account for 76% of all highway projects in MDT. 


Table 3.1 Distribution of highway projects by project work type 


Row Project work type Frequency Percentage of total 
1 Overlays 263 24% 
2 Reconstruction, grading 178 16% 
3 Safety 168 15% 
4 Seal & cover 129 12% 
5 Bridge construction, rehab, and removal 86 8% 
6 Slides or slope stabilization 36 3% 
7 Signals 29 3% 
8 Guardrail 25 2% 
9 Microsurfacing 21 2% 
10 Miscellaneous 16 1% 
11 Rehab (minor grade & overlay) 17 2% 
12 Crack seal 17 2% 
13 Signing 13 1% 
14 Drainage 11 1% 
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15 Portland cement concrete pavement 8 1% 


16 Sidewalk 22 2% 
17 Environmental and wetland 6 1% 
18 Fencing 10 1% 
19 Bike and pedestrian 18 2% 
20 Buildings (scales, rest areas) 6 1% 
21 Rumble strips 6 1% 
22 Lighting 2 0% 
23 Scour projects 2 0% 
24 Warm mix bit surf l 0% 
Total count 1090 100% 


3.1.1 Work Items 


The frequency of bid items or work items in the database was analyzed to remove work items with 
a low frequency of occurrence to avoid complexity in model development. The research team 
already identified most significant controlling work items in MDT projects from a recently 
completed research project (Jeong & Alikhani, 2020). A controlling work item may consist of one 
or multiple work items and is defined as major activities associated with a relatively high amount 
of work quantity. Controlling work items influence the duration of a project and they are highly 
likely to fall on the critical path of the project schedule (Jeong & Alikhani, 2020). The detailed list 
of all controlling work items and their associated work items is available in Jeong & Alikhani 
(2020). In this research, controlling items that appeared in less than 50 projects from the entire 
database of 1,090 projects were eliminated in order to reduce the number of variables and increase 
the robustness of the model. Table 3.2 shows the frequency of controlling work items in the 
database after removing uncommon controlling items. Common controlling work items (occurred 
in more than 50 projects in the past) were used as input variables. 


Table 3.2 Frequency analysis of controlling work items that appeared in at least 50 projects 


Row Controlling Work Item Frequency % of projects 
1 Mobilization 1090 100% 
2 Traffic control 1086 100% 
3 Remove existing structures 868 80% 
+ Pavement marking 839 77% 
5 Emulsified asphalt 718 66% 
6 Cover 700 64% 
7 Signs 697 64% 
8 Temporary activities 596 55% 
9 Crushed aggregate course 500 46% 
10 Base preparations (soil stabilization) 460 42% 
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11 Guard rail 460 42% 


12 Milling and pulverizing 457 42% 
13 Asphalt cement 367 34% 
14 Plant mix surfacing 354 32% 
15 Seeding 340 31% 
16 Topsoil-salvaging and placing 321 29% 
17 Rumble strips 308 28% 
18 Commercial mix 272 25% 
19 Excavation-unclassified 272 25% 
20 Farm fence 270 25% 
21 Drainage pipe (<=24 in) 264 24% 
22 Bridge deck 246 23% 
23 Riprap 234 22% 
24 Special borrow 219 20% 
25 Sidewalk 181 17% 
26 Curb and gutter 176 16% 
27 Embankment in place 158 15% 
28 Drainage pipe (> 24 in) 155 14% 
29 Deck grooving (after curing) 125 12% 
30 Reinforcing steel 113 10% 
31 Bridge deck repair 103 9% 

32 Piling 93 8% 


3.1.2 Project Duration Performance 


Two attributes related to project duration exist in the database: project bid days and total charged 
days. Project bid days indicate the estimated project time before the beginning of the construction 
and the total charged days reflect the actual duration of the project. Figure 3.1 shows a histogram 
of projects by different ranges of charged days. 75% of projects took less than 100 working days, 
indicating that most MDT projects take less than a year to complete. 
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Figure 3.1 Frequency of projects with different charged days 


The project bid days are compared to the project charged days to evaluate the accuracy of project 
time estimation before construction. Figure 3.2 illustrates the average percentage of the difference 
between estimated duration and actual duration for different ranges of project bid time. The 
percentage is calculated using Equation 1. 


Charged days-Bid days 


k : 
Bid days 100 (Equation 1) 


Percentage of time difference = 
For example, according to Figure 3.1, 17% of projects took less than 20 working days. Figure 3.2 
shows the difference between estimated and actual project time for these projects is -23.7%, 
meaning such projects were finished 23.7% sooner than estimated. The percentage rises with the 
increase of project bid days. In projects that take 120 to 140 days, there is an average of 21.8% 


delay in finishing the project within estimated time. Figure 3.2 indicates that many projects have 
not been completed within the estimated time. 
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Figure 3.2 Average percentage of the difference between bid days and charged days 


3.1.3 Project Cost Performance 


Figure 3.3 shows a histogram of project cost in scale of million dollars. All historical project costs 
are adjusted to the base year of 2018 using the National Highway Construction Cost Index 
(NHCCI) in order to neutralize the impact of inflation and consider the time value of money. Jeong 
et al. (2017) identified multidimensional HCCIs for highway projects in MDT ranging from 2010 
to 2014, which are used in this study. For the rest of the years (2008-2010 and 2010 to 2016), the 
NHCCI of FHWA was used (FHWA, 2022). Table 3.3 illustrates the Cost Indexes for different 
years used in this study. 
Table 3.3 Cost Indexes for different years used in this study 


Year Cost Index Resource 

2008 115.06 FHWA (2020) 
2009 100.25 FHWA (2020) 
2010 100 Jeong et al. (2017) 
2011 110.46 Jeong et al. (2017) 
2012 111.12 Jeong et al. (2017) 
2013 113.06 Jeong et al. (2017) 
2014 115.46 Jeong et al. (2017) 
2015 118.01 FHWA (2020) 
2016 115.38 FHWA (2020) 
2017 116.34 FHWA (2020) 
2018 124.1 FHWA (2020) 
2019 133.45 FHWA (2020) 
2020 133.4 FHWA (2020) 
2021 139.66 FHWA (2020) 
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Shrestha et al. (2017) divided MDT projects in terms of dollar value into three ranges of small size 
(up to $3.5M), medium size ($3.5M to $10.5M), and large size ($10.5M to $50M). According to 
Figure 3.3, 67% of the projects took less than $3.5M, indicating that small size projects are very 
common in MDT. 
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Figure 3.3 Frequency of projects in terms of cost 


3.2 Preliminary Analysis on Correlations Between Variables 


Correlations between project characteristics were explored to identify the importance of variables 
in estimating the project duration. Pearson correlation coefficient is typically used to measure the 
strength of a linear association between two variables. The Pearson coefficient can take a value 
between -1 to +1, where the value of 0 indicates no correlation and the values of | and -1 show the 
highest correlation. Positive values indicate a positive relationship, whereas negative values 
indicate a negative relationship. A positive association means that when the value of one variable 
rises, so does the value of the other. A negative relationship means when one variable decreases, 
another increases. Table 3.4 illustrates the Pearson correlation coefficients for project features 
including project location, project starting year, and project cost, project type, and controlling work 
items with project time. According to the table, the project cost has the highest correlation with 
project time followed by controlling work items. No relationship was detected between project 
location and project time. Correlation between project type and project time is insignificant. Also, 
project starting year does not influence the project time. 
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Table 3.4 Pearson correlation coefficient between project features and project time 


Project feature Pearson Coefficient 
Location 0 

Starting year -0.035 

Project cost 0.83 

Controlling work items — 0.72 

Project type 0.07 


The results of the preliminary analysis on correlations between project features and project 
duration are used at the model development stage to pick the most correlated variables and 
eliminate insignificant variables. 
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4 AI Model Development 


This chapter describes the AI model development process for project duration estimation. To 
increase the model's performance, hyperparameter tuning, model regularization, and feature 
selection were used. The model was trained and tested by splitting the entire dataset into the 
training and testing datasets. A regression model was also developed for comparison with the AI 
model and as a companion to the AI model. The statistical metrics of the models as well as practical 
value of the models are described in the chapter. 


4. Regression Analysis with Artificial Neural Networks 


Artificial Neural Networks (ANNs) is an algorithm that learns the relationships between features 
and the target. In this research, feature variables including numeric variables (such as quantity of 
work item) and categorical variables are preliminarily used as input variables. The target is a 
numerical variable (project duration) as the output variable. To handle categorical variables, they 
are transformed to binary values. Figure 4.1 shows the architecture of the ANN model. In the input 
layer, the ANN model receives several numerical input variables and passes them through hidden 
layers, where weight matrices and non-linear functions are applied to the input values and added 
together to form a single number as the output value. 


Inputlayer Hidden layer/ layers Output 
(project time) 


Estimated cost 
Bid item #1 quantity 
Bid item #2 quantity 


Bid item #3 quantity 


Project location 


Figure 4.1 Architecture of the ANN model 


4.0 Database Overview 


Data attributes include six feature categories of project characteristics including five independent 
variables and one dependent variable. Independent variables include controlling work item 
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quantities (total of 32 controlling work items), project work type (total of 24 types), project 
location (urban/rural), starting year, and construction cost. Independent variables are used to 
predict the dependent variable of project duration. Independent variables include numerical (such 
as project cost) and categorical (such as project type) variables. Categorical variables are 
transferred to numerical variables using one-hot encoding technique to shape the input matrix with 
the size of 1,090*60 (total number of projects* total number of columns). Target matrix has the 
size of 1,090*1 (the bid days of all 1,090 projects). Columns of the input matrix correspond to 
project features depicted in Figure 4.2. Initial 32 variables correspond to work item quantities, the 
next 24 variables account for project types, the next two variables correspond to project location, 
followed by starting year and estimated construction cost. 


Work item quantities Project types Location Starting year Project cost 


—— n A SEA t 


Figure 4.2 Feature arrangement in the feature matrix 


4.3 Model Development, Training, and Testing 


The current dataset includes values of different types with different ranges. For example, estimated 
construction cost is a number reflecting the estimated monetary value, while work item quantity is 
a number indicating the size of a particular work item. To handle the variety of values, each column 
is normalized separately and scaled into the range of [0,1]. The whole dataset is divided into 80% 
for training and 20% for testing. To test the model, two metrics of R-squared and Mean Squared 
Error (MSE) are used. MSE measures the average squared difference between the estimated values 
and the actual values. R-squared is a goodness-of-fit measure for a regression model that quantifies 
the proportion of variation explained by an independent variable in a regression model for a 
dependent variable. It is preferable to have a lower MSE and a higher R-squared. The final ANN 
model includes one input layer, multiple hidden layers with activation function of Sigmoid, and 
one output layer. 


4.4 Comparison of the ANN Model with Regression Model 


A linear regression model was also developed with the same dataset to compare with the ANN 
model. Table 4.1 summarizes the comparison results of both models in terms of two metrics. Slight 
differences indicate the similar performance of both ANN and Linear Regression models in 
predicting project duration. 


21 


Table 4.1 Comparison of linear regression and ANN model using whole features 


Model MSE _ R-squared 


ANN 0.0023 0.82 
Linear Regression 0.0033 0.80 


4.5 Feature Selection 


In developing predictive models, feature selection is the process of selecting the most influential 
input variables. It is desired to limit the number of input variables in order to lower the 
computational cost of modeling and, in certain situations, increase the model's performance. To 
select important features, a feature score is calculated based on F-test. F-test is a statistical test 
which compares the importance of a model's improvement when new variables are included. 
Figure 4.3 illustrates the importance scores of all features. Refer to Figure 4.2 to figure out which 
feature corresponds to which project attribute. For example, feature # 59: estimated construction 
cost shows the highest importance in determining project duration. 
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Figure 4.3 Feature importance scores 


According to Figure 4.3, some features have a higher importance score than others. Feature 
selection includes selecting such important features and removing insignificant features. To 
identify the best number of features to select for the final model, different model configurations 
with a different number of features were developed and validated using k-fold cross validation 
with k=10 with the metric of MSE. Figure 4.4 shows the means of MSEs resulting from the k-fold 
cross validation for different numbers of features. The lowest MSE is obtained when the number 
of features is 26. Because of the randomness of partitioning the dataset into training and testing 
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datasets during the cross-validation procedure, the minimum value of MSE differs somewhat from 


the model's MSE. For the final model, the top 26 features according to their feature score were 
selected. 


0.012 


15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 
Number of features 


Figure 4.4 Mean of MSE in a 10-fold cross validation for different number of features 


Among the top 26 features, the first and most significant feature is the estimated construction cost, 
and the other features include 25 controlling work items. Table 4.2 shows 25 significant controlling 
work items whose quantities are highly influential on project duration. Pearson coefficient was 
used to determine the correlation between each controlling item and the project duration. Among 
all work items, the “Traffic Control” work item has the highest association with project duration 
with the Person coefficient of 0.63. The Traffic Control work item includes the pay items of 
TRAFFIC CONTROL DEVICES CB with the item code of 618030005 and 999618000, TRAFFIC 
CONTROL with the item code of 999618030 and 999618035), and TRAFFIC CONTROL-FIXED 
with the item code of 618030015 (MDT, 2021). In the database, the pay item of TRAFFIC 
CONTROL DEVICES CB accounts for 90% of traffic control related pay items, that is the number 
of devices used for maintaining the traffic during construction (MDT, 2020). 
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Table 4.2 Significant controlling work items with highest impact on project duration 


Correlation with time 


Row Controlling Work Items (Pearson Coefficient) 
1 Traffic control 0.63 
2 Drainage pipe (<= 24 in) 0.53 
3 Crushed aggregate course 0.52 
4 Excavation-unclassified 0.51 
5 Mobilization 0.50 
6 Seeding 0.50 
7 Special borrow 0.44 
8 Reinforcing steel 0.39 
9 Plant mix surfacing 0.38 
10 Asphalt cement 0.38 
11 Drainage pipe (> 24 in) 0.36 
12 Deck grooving (after curing) 0.36 
13 Riprap 0.35 
14 Piling 0.27 
15 Sidewalk 0.24 
16 Curb and gutter 0.24 
17 Guard rail 0.19 
18 Bridge deck 0.19 
19 Commercial mix 0.17 
20 Signs 0.16 
21 Embankment in place 0.14 
22 Emulsified asphalt 0.11 
23 Cover 0.11 
24 Rumble strips 0.10 
25 Milling and pulverizing 0.10 
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The results of feature selection are compatible with correlation analysis in the last chapter. 
Estimated construction cost was discovered as the most significant variable in both correlation 
analysis and feature selection analysis, while project location and starting year were identified as 
insignificant in influecint project duration. 


4.6 Final Results 


After feature selection and correlation analysis, the number of features reduced from 60 to 26. The 
ANN and linear regression models were developed based upon the selected features. Table 4.3 
summarizes the final MSE and R-squared of the models. In terms of project duration estimation, 
the result demonstrates that the ANN outperforms Regression by a little margin. Because the 
differences between the two models are minor, both will be incorporated in the Microsoft Excel 
tool, which will allow users to consider the estimates of both models. 


Table 4.3 Comparison of linear regression and ANN model using selected features 


Model MSE _ R-squared 


ANN 0.0022 0.72 
Linear Regression 0.0034 0.75 
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5. Tool Development 


This chapter describes the development process of a Microsoft Excel based Tool for 
implementation and provides a comprehensive guideline on using the tool. The tool helps MDT 
engineers to enter the quantities of major activities and the estimated construction cost to receive 
an early estimation of project duration using the two models developed in this research. Real 
examples are used to show how the tool works and how to obtain and interpret the results. 


5.1 Tool Development Process 


The ANN and the regression models developed in this research were transferred and embedded 
into a Microsoft Excel tool. Figure 5.1 represents the equation of the linear regression model. The 
vector of input variables incorporates the quantities of 25 major work items plus the estimated 
construction cost (vector size of 1*26). The unit impact of input variables is then eliminated by 
normalizing the input vector, converting values to numbers between 0 and 1 using Equation 5.1. 


(X—- Xmin) 


cx Equation 5.1 


Normalized = 


The normalized input vector is then multiplied by a matrix of coefficients (size of 26*1) and added 
to an intercept value to yield the output, which is the estimated project duration in this research. 
The coefficient values and the intercept value obtained from the regression model are transferred 
to an Microsoft Excel sheet in the tool (a hidden sheet) to interact with the input variables in the 
background to calculate the estimated project duration as the output. 


Input variables Coefficients 
Coeff. #1 ^ 
Coeff. #2 
[ var. #1 var. #2  ... var #26 | Na Coeff. #3 -+ Intercept — Output 
Coeff. #26_J 


Figure 5.1 Linear regression equation presentation 
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The ANN model includes an input layer (size of 1*26), three hidden layers with activation 
functions and an output layer (size of 1*1). The hidden layers include a weight matrix (similar to 
coefficients matrix in the regression model) and a bias matrix (similar to intercepts matrix in the 
regression model) and an activation function. The following describes how to obtain the network's 
output, which represents the estimated project duration. 


Output of layer 1: Relu [ normalized input vector (1*26) * W!6+500)  B(1*500j] 
Output of layer 2: Relu [ layerl output (1500) * W7(s00*100) + B^(1*100)] 
Output of layer 3: Sigmoid [ layer2_ output (1*100) * W?^(100*30) + B^(1*50)] 


Output of network: layer3 output (1*5) * W^(sosi) + Brat 


Where W is the weights and B is the bias matrix. The subscripts in parentheses indicate the size of 
matrixes and superscripts indicate the hidden layer numbers. Relu is a function that outputs the 
input directly if it is positive, otherwise, it outputs zero. Sigmoid function is obtained from 
Equation 5.2, which yields numbers between 0 and 1. 


TERES : 
Sigmoid = oe Equation 5.2 


The weights and bias matrixes as well as the activation functions are transferred to the Microsoft 
Excel tool (in hidden sheets) to execute interactions in the background and compute the network's 
output automatically given the input vector. 


5.2 Tool Overview 


The MS Excel Tool was named AI-PDET (Artificial Intelligence based Project Duration 
Estimation Tool). As Figure 5.2 shows, the AI-PDET includes three sheets; I) Home page, which 
provides a general guideline on using the tool. Figure 1.1 shows a screenshot of the home page. 
Clicking the “Start Analysis" button opens up the next worksheet. II) Project duration estimation 
work sheet, that takes input variables and predicts the duration of the project with two different 
methods of the ANN model and the regression model. III) sample projects sheet, which includes 
the information of three real cases to provide examples of using the tool. 
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ES B Return Table Data as JS.. EP Move Rows Between Ta.. EP Create, Sort, and Forma. tas 
Record New | [E] Generate a Sales Summ... EP @Mentions in a Status... E Count Empty Rows to a 
Actions Script |". Saipts 
Scripting Tools Office Scripts Flow Templates 
f * When the estimated construction cost is higher than $8M, the prediction accuracy may not be reliable due to the limited number of data points used in training 
B c D E [f G H | J K L M N fe} p Q R S 


Artificial Intelligence (M) Based Project Duration Estimation Tool 


ewe E 


[AI-PDET] 
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17 

18 

19 
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21 

22 

23 * This tool has been developed to estimate the probable duration of a construction project when there is limited project information available 

24 * To use this tool, you wil need to input the estimated construction cost of the project and the estimated quantities of major work items. 

25 * This tool has embedded Aritifical Neural Network (ANN) and Regression Models as computational engines to estimate the project duration in working days 
26 * This tool can be used during preconstruction stages to rapidly estimate the project duration 

27 * This tool can serve as a valuable back-check, enabling the assessment of reasonableness of a project's estimated duration calculated during the final design 
28 stage with detailed project information 


29 timated construction cost is higher than $8M, the prediction accuracy may not be reliable due to the limited number of data points used in training 
[9] 


Home | ContractTimetstimation | SampleProjects | 


Ready — jk Accessibility: Investigate 


Three main sheets 


t 


General guideline on the tool 


Figure 5.2 Screenshot of AI-PDET home page 


5.3 Project Duration Estimation 


Figure 5.3 illustrates the second page of the tool in which a user enters input variables in the input 
section and then, the output section shows the two estimated project durations predicted from ANN 
and regression models. The input section takes the estimated quantity of each of the 25 major work 


items and the estimated construction cost of the project in the yellow cells. Each major work item 


may include one or multiple pay items. To learn more about pay items and their associated major 
work items, refer to Appendix A of Jeong & Alikhani (2020). After the input section is completed, 
AI-PDET automatically presents the predicted project durations in the output section. 
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Figure 5.3 Screenshot of the AI-PDET Input and Output Page 


5.4 Sample projects 


The third page of the AI-PDET includes three cases of real highway projects in MDT (Figure 5.4). 
The first project is categorized as "Miscellaneous," the second as "Bike and pedestrian," and the 
third is a "Bridge construction" work type. All projects started in 2019. The actual charged days 
of each project are provided in the sheet to help the user to compare it with the predicted project 
durations with the AI-PDET. The quantities of major work items with their proper units are 
provided for each project. When the user takes the provided input values and enter them in the 
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second sheet of the AI-PDET, the user can see the predicted project durations on the right hand 
side and compare them with the real duration of the project. 


Project characteristics Case3 
Project number 9617133000 9149077000 8085164000 
Project ID IM 90-9(133)528 TA 41(77) 
, MISCELLANEOUS BIKE AND PEDESTRIAN BRIDGE 

Project type CONSTRUCTION,RE 
Project location Bighorn County Ravalli County Cascade County 
Project begin year 2019 
Engineers’ estimate ($) 2,682,663 519,287 | $ 5,980,982 
Project time (working days) 162 

Resource MDT website 
Activity quantities E eee 
Deck Grooving 0 0 
Embankment in Place 0 31.56 

Pilling 0 0 

Guardrail 0 0 

Curb and Gutter 0 0 
Commercial Mix 0 1006.23 
[Plant Mix Surfacing 0 0 
Mobilization 1 1 

Milling and Pulverizing 0 6322.2 
Crushed Aggregate Course 30.5 0 

Reinforcing Steel 0 30444 
Excavation-unclassified 507.9 0 

Cover 0 7441 

Riprap 0 0 

Traffic control 1 295704 
Bridge Deck 0 0 

Signs 0 360 

Seeding 0 0 

Asphalt Cement 0 0 

Drainage Pipe (« 24 in) 0 0 

Special Borrow 0 0 

Sidewalk 1231 0 

Drainage Pipe (» 24 in) 0 

Rumble Strips 0 

Emulsified Asphalt 0 

Home ContractTimeEstimation SampleProjects + 


Figure 5.3 Screenshot of the sample projects sheet of AI-PDET 


5.5 Results Interpretation and Model Limitations 


The AI-PDET is a top-down tool that can estimate a project’s duration when a limited amount of 
project information is available during the preconstruction stages. The estimated project duration 
using this tool is helpful in the early stages of the project delivery process for project programming 
and budgeting purposes. It can also be used to check the reasonableness of the project duration 
estimate derived from detailed project scheduling activities, once more detailed information on 
activity quantities, production rates, and activity sequencing become available in later design 
stages. 


30 


6. Maintenance of AI-PDET and Database Update 


This research used a data-driven approach to leverage historical data to predict the project duration 
of a new construction project in the early construction phases. As time goes on, new projects are 
carried out, containing new experiences. The data from recent projects can be added to the older 
ones to enrich the database. Data-driven prediction models can be trained on the updated database 
to obtain new knowledge from recent projects to enhance predictions in the future. Two data- 
driven models developed in this research include the ANN and the regression model trained on 
data from 2008 to 2019. 


This chapter explains how new project data can be used to update the models and the AI-PDET to 
make the tool more relevant to recent projects. In this research, coding was conducted in the 
Google Colab environment using Python language to create, train, and evaluate models. This 
chapter explains different coding parts and actions that need to be taken to transfer models’ 
attributes to AI-PDET. The input of the code includes the excel dataset of projects, and the code’s 
output is the attributes of the models, such as weights, biases, coefficients, and intercepts. The 
models’ attributes need to be transferred to the AI-PDET to finalize the updating process. 


6.1 Updating Process for AI-PDET 


There are three main steps to update AI-PDET (Figure 6.1). Step 1 involves updating the project's 
historical data and, if needed, updating the activity dictionary and the cost indexes. Step 2 includes 
running the developed code in this research that receives updated files of Step 1 to train the ANN 
and the regression models and output models' attributes. In Step 3, the models' attributes are 
transferred into the AI-PDET. 


Step 1 Step 2 Step 3 


Update historical data Run the provided code Update AI-PDET 
- Update the - Insert locations of - Unhide background sheets 


"Production Rate History.xlsx" updated Excel files of the AI-PDET 
- Modify the "Activities CostIndex.xlsx " - Run the code to receive - Transfer coding’s output to 
if needed models' attributes the AI-PDET 


Figure 6.1 Process of updating the AI-PDET 
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6.2 Step 1: Update Historical Data 


Two Excel files must be updated as inputs of the coding section. The first Excel file is the dataset 
of past projects. The original dataset obtained from MDT in this research is named 
"Production Rate History.xlsx", which is assumed as the main historical dataset (Figure 6.2). For 
the purpose of this research, it is required to update and use two sheets in this file called 
"Contract info" and "Item Info". Attributes used from the "Contract info" include project 
number, project cost, adjusted charge days, and project year. The "Item Info" sheet is used to 
obtain the information of work item quantities. 


AutoSave @orr A BD» -. 5b Raw, Production Rate History 


Home Insert Draw Page Layout Formulas Data Review View Developer Q] 


v 4 Calibri (Body) v 9 v AD eae © S [E] Conditional Form 
[n [yy % 


EZ Format as Table 


Paste EB 7] Uy Ow A v Alignment Number 
g% = Bs 


F Cell Styles v 
jx CONT ID 


B E D E F G H 


i] CONTID |7 Icont_pesc 7 | TOT BID. AM Y | NET C O AM 7 | BID D| * | TM, CHRG | * | CURRENT DAY * | LD. RAI 
8408 SLOPE FLTN-WIDEN-GALLATIN CANYON 12801152.93 232477895 210 AD 260 
SLOPE FLTN-WIDEN-GALLATIN CANYON 12801152.93 232477895 210 AD 260 
EVARO - MCCLURE ROAD 2898186518 — 166823227 320 CD 
BAINVILLE - E& W 17498166.91 181931400 240 AD 
POLY DRIVE SOUTH-BILLINGS 7718155.42 8872130 150 AD 
CANYON CREEK NORTH - BILLINGS 8478131.62 -8819.24 190 AD 
BAKER - SOUTH 8107444.85 18570612 135 AD 
LEWISTOWN - WEST 1256230327  -531679.64 275 AD 
CAPITOL INTERCHANGE - HELENA 2218218.00 8635.00 85 AD 
STRUCTURES - SE OF MANHATTAN 5548515.56 3526750 200 AD 
LAUREL - NORTHEAST 4589176.24 28262516 120 AD 
40 KM S OF EKALAKA - S (PH Ill) 11639238.83 94722.81 175 AD 
2002-BIG MUDDY CREEK-NO. OF BYNUM 2480682.64 -1223.00 120 AD 
D4 - INTERSTATE STRUCTURE REHAB 484453010 65476193 180 AD 
STRUCTURES - NE OF EKALAKA 2714672.95 0.00 150 AD 
STRUCTURES - NE OF EKALAKA 2714672.95 0.00 150 AD 
STMARYSRD - N &S 5039730.26 -52415.50 150 AD 
2000 - SAFETY IMPROVEMENT - HILGER 1585765.52 60750.26 110 AD 
SLIDE 4 MILES WEST OF BEARCREEK 75747254 -1466.90 50 AD 
8 KM S OF POLSON - SOUTH 813540213 58726717 180 AD 
INDIAN PRAIRIE LOOP-NORTH & SOUTH 4679970.50 0.00 130 AD 
JUNCTION 5-322 - SOUTH 6595967.64 5372.18 180 AD 
JCT S-284 - WEST 7147550.48 -53811.00 180 AD 
EAST THREE FORKS INTERCHANGE 3342557.35 63175550 160 AD 
7 KM EAST OF COLUMBUS - EAST 3450648.16 -4507.07 120 AD 
JCT MT 85 - EAST (EAST SECTION) 4476503.00 415547.70 90 AD 
RACETRACK-S OF WARM SPRINGS 3859296.94 95315.83 75 AD 
RACETRACK-S OF WARM SPRINGS 3859296.94 95315.83 75 AD 
HUFFINE LN - FOUR CORNERS TO 19TH 2043778.34 — 255986.00 50 AD 
HUFFINE LN - FOUR CORNERS TO 19TH 2043778.34 — 255986.00 50 AD 
JUDITH GAP 3790379.14 -24519.00 160 AD 
SHILOH ROAD CORRIDOR-BILLINGS 5199668.16  184869.07 115 AD 
ICT RANT ERGER TRAN . NE aaaaaa aa nnn &n An 


Contract Info Estimate Info Item Info Critical Date Info SQL Statement 


Figure 6.2 Screenshot of the *Production Rate History.xlsx" Excel file 


The second imported Excel file called *Activities CostIndex.xlsx", developed in this research, 
including the list of work items and their associated pay items and cost indexes. Figure 6.3 
illustrates a screenshot of this Excel file that includes two sheets. Two columns on the 
"Controlling" page list the "pay item description" and the related "controlling activities". Refer to 
the project's final report for further information on how pay items are merged and tied to activities. 
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This page can be modified by placing the name of the controlling activity in front of the pay items 
if more pay items need to be linked to controlling activities. The year and the year's cost index are 
listed on the "CostIdex" sheet. In this study, the ratio of indexes of different years is employed to 
adjust project cost to the base year of 2021 as the current year of research. For future uses, there is 
a need to add the cost indexes for upcoming years to this file. For more information about the cost 
indexes used in this research, refer to the final report. Both Excel files can get updated over time 
as more information on new projects and cost indexes becomes available. 


AutoSave @orr A eoe 


AutoSave Gor A B B 


Associated 
controlling 
activity 


Cost index 


Pay item title 
u of the year 


A B 


1 ITEM DESC | Y | Controlling Activity Costindex 
203 CSP 2400 MM X 2.77 MM CTD > ~ DRAINAGE PIPE (> 24 IN) 115.06 


204 CSP 2700 MM 2.77 MM DRAINAGE PIPE (> 24 IN) 100.25 


205 CSP 2700 MM X 2.77 MM CTD 
206 CSP 300 MM X 2.01 MM 

207 CSP 300 MM X 2.01 MM CTD 
208 CSP 3000 MM X 3.51 MM 
209 CSP 450 MM X 1.63 MM 

210 CSP 450 MM X 2.01 MM 

211 CSP 600 MM X 2.01 MM CTD 
212 CSP 84 IN 0.079-CTD 

213 CSP 900 MM X 2.01 MM CTD 
214 CSP 96 IN 0.109-CTD 

215 CSP ARCH 128 IN 0.138-CTD 
216 CSP IRR 1050 MM X 2.01 MM 
217 CSP IRR 300 MM X 2.01 MM 
218 CSP IRR 450 MM X 2.01 MM 
219 CSP IRR 600 MM X 1.63 MM 
220 CSP IRR 600 MM X 2.01 MM 


221 CSP SLOTTED 300 MM X 1.63 MM 


222 CSPA 1240 MM X 2.77 MM 


223 CSPA IR 1060 MM X 2.01 MM CTD 


224 CURB AND GUTTER-CONC 

225 CURB AND GUTTER-CONCRETE 
226 CURB BITUMINOUS 

227 CURB MARKING-WHITE EPOXY 


DRAINAGE PIPE (> 24 IN) 
DRAINAGE PIPE (<= 24 IN) 
DRAINAGE PIPE (<= 24 IN) 
DRAINAGE PIPE (> 24 IN) 
DRAINAGE PIPE (<= 24 IN) 
DRAINAGE PIPE (<= 24 IN) 
DRAINAGE PIPE (<= 24 IN) 
DRAINAGE PIPE (> 24 IN) 
DRAINAGE PIPE (> 24 IN) 
DRAINAGE PIPE (> 24 IN) 
DRAINAGE PIPE (<= 24 IN) 
DRAINAGE PIPE (> 24 IN) 
DRAINAGE PIPE (<= 24 IN) 
DRAINAGE PIPE (<= 24 IN) 
DRAINAGE PIPE (<= 24 IN) 
DRAINAGE PIPE (<= 24 IN) 


DRAINAGE PIPE (> 24 IN) 
DRAINAGE PIPE (> 24 IN) 
CURB AND GUTTER 
CURB AND GUTTER 


PAVEMENT MARKING 


100 
110.46 
111.12 
113.06 
115.46 
118.01| ] 
115.38 
116.34 

124.1 
133.45 
133.4 
139.66 


228 CURB MARKING-YELLOW EPOXY 
229 CURB MARKING-YELLOW EPOXY 
230 CURB MARKING-YELLOW EPOXY 
231 CURB STOP AND BOX-19 MM 
Controlling Costlndex oF Controlling CostIndex + 


PAVEMENT MARKING 
PAVEMENT MARKING 


Figure 6.3 Screenshot of two sheets in the “Activities CostIndex.xlsx" 


6.3 Step 2: Run the Provided Code 


The developed code in this research takes the MS Excel files of the last step to train and evaluate 
the ANN and regression models and produce the models' attributes required to update the AI- 
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PDET. The code consists of six parts, five of which use input data to train and test models, while 
the sixth part outputs the properties of the models in MS Excel format. The following explains 
each section of the code and discusses how to update the AI-PDET using the coding outputs. 


6.3.1 Part 1: Import Required Python Libraries and Read Raw Data 


Figure 6.4 shows the first part of the code. Required python libraries are imported to use the 
features of pre-defined functions to facilitate the computation, model development, and evaluation. 
Two updated excel files from the last step are imported. The location of the files has to be changed 
in order to allow the code to read the files from the local computer or a virtual drive. 


vrs wo p) ew: 
o #Partl: Import required python libraries, read raw data, and import basic information 

#libraries 

import pandas as pd 


import numpy as np 5x 

from sklearn.preprocessing import StandardScaler 

import matplotlib.pyplot as plt 

from pandas import read csv 

from keras.models import Sequential 

from keras.layers import Dense 

from keras.wrappers.scikit learn import KerasRegressor d : 

from sklearn.model selection import cross val score > Import libraries 

from sklearn.model_selection import KFold 

from sklearn.preprocessing import MinMaxScaler 

from sklearn.model_selection import train_test_split 

from keras import activations Excel file locations 
from sklearn import metrics 

from sklearn.linear_model import LinearRegression has to be changed 
from sklearn.metrics import mean_squared error Pi 


Import Excel files 


#read data from the "Prod 


#read data from the "Activities CostIndex.xlsx" excel file, which is fixed information about pay items and thier associated activities and cost indexes t 
df_ctrl=pd.read_excel('/content/drive/Shareddrives/MDT AI for Contract Time/Rawdata/Activities_CostIndex.xlsx',sheet_name='Controlling') 
df costindx- pd.read excel('/content/drive/Shareddrives/MDT AI for Contract Time/Rawdata/Activities CostIndex.xlsx',sheet name-'CostIndex') 


Figure 6.4 Screenshot of Part 1 of the code 


6.3.2 Part 2: Define Activities, Project Numbers, Project Quantities, Project Costs, and 
Project Times 


Part 2 of the code obtains information from the imported files (Figure 6.5). First, a list of activities 
is defined, including the name of 25 significant activities and their associated pay items obtained 
through the feature selection process in Task 3. Then, project information is obtained from the 
imported Excel files, including: 


e Project number: extracted from *PRJ NBR” column of the Contract Info" sheet from the 
"Production Rate History.xlsx" Excel file. 

e Project quantities: for each project, the quantity of each pay item is available in "Item Info" 
sheet of the “Production Rate History.xlsx" Excel file. The pay item quantities are used 
to compute the quantity of the 25 important activities for each project. 

e Project year: extracted from "WORK BEG” column of the Contract Info" sheet. 
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e Project cost: the cost of each project is the summation of “TOT BID AMT" and 
"NET C O AMT” of each project available in the *Contract Info" sheet. The costs are 
then adjusted using the cost indexes imported from the “Activities CostIndex.xlsx" Excel 
file. 

e Project time: if the "ADJUST CHARGE DAYS" column of the Contract Info" sheet 
has a value, it is considered as the project time. Otherwise, the “CHARGE DAYS" column 
is used as equal to project time. 


o #Part 2: define activities, project numbers, project quantities, project costs, and project times 
#define list of activities 
ActivitiesNew-['DECK GROOVING (after curing)','EMBANKMENT IN PLACE','PILING','GUARD RAIL','CURB AND GUTTER','COMMERCIAL MIX', 
'PLANT MIX SURFACING','MOBILIZATION','MILLING AND PULVERIZING','CRUSHED AGGREGATE COURSE','REINFORCING STEEL', 
'EXCAVATION-UNCLASSIFIED','COVER','RIPRAP','TRAFFIC CONTROL','BRIDGE DECK', 'SIGNS', 'SEEDING', ASPHALT CEMENT', 
"DRAINAGE PIPE (<= 24 IN)','SPECIAL BORROW','SIDEWALK','DRAINAGE PIPE (> 24 IN)','RUMBLE STRIPS','EMULSIFIED ASPHALT'] 


#define a dictionary of pay items and their activities r y 

controlling-list(set(df ctrl['Controlling'])) Extract project : 

payitem actvty-() , information from the Define list of 25 

for i in range(len(df ctrl['ITEM DESC'])): . $ E 
if df_ctrl.iloc[i,1] in ActivitiesNew: imported Excel files important activities 


payitem actvty[df ctrl.iloc[i,0]]-[df ctrl.iloc[i,1]] 


fdefine project number list 
proj nbr-list(set(datal['PRJ NBR'])) 
print('Number of projects:',len(proj nbr) ) 


#find the quantities of each item for each project 
projquant-() 


for j in proj nbr: 
projquant[j] = [0] * len(ActivitiesNew) 
for i in range(len(data2['PRJ NBR'])): 
if data2['ITEM DESC'][i] in payitem actvty.keys(): 


fupdate values of quantities for each project 
projquant[data2['PRJ NBR'][i]][ActivitiesNew.index(payitem actvty[data2['ITEM DESC'][i]][0])] += data2['QTY PD TO DT'][i] 


#find project starting year 
proj_year={} 
for i in range(len(datal['PRJ NBR'])): 
proj year[datal['PRJ NBR'][i]]sestr(datal['WORK BEG'][i])[0:4] 


#find project costs and adjust them to the base year of 2021 
#project cost is the "total bid amount" plus "net-c-o-amt" amount 
proj cost-() 
for i in range(len(datal['PRJ NBR'])): 

proj cost[datal['PRJ NBR'][i]]-datal['TOT BID AMT'][i]*datal['NET C O AMT'][i] 
proj cost adj-() 
for i in proj cost.keys(): 

if int(proj year[i]) in list(df costindx['Year']): 

proj cost adj[i]-proj cost[i]*int((df costindx.loc[df costindx.Year--2021, CostIndex']))/int((df costindx.loc[df costindx.Year--i 


#find project time that is ‘adjusted charge days' if available, otherwise is ‘charge days 
proj time-() 
for i in range(len(datal['PRJ NBR'])): 
if np.isnan(datal['ADJUST CHARGE DAYS'][i]): 
proj time[datal['PRJ NBR'][i]]2datal['CHARGE DAYS'][i] 
else: 
proj time[datal['PRJ NBR'][i]]edatal['ADJUST CHARGE DAYS'][i] 


Figure 6.5 Screenshot of Part 2 of the code 
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6.3.3 Part 3: Create the Dataset, Normalize Data, and Split the Dataset into Train and 
Test 


Part 3 of the code stacks project quantities and project adjusted costs to be used as the input matrix 
(X) and saves the project durations in a vector as the output (y). The input and output are then 
normalized to have values between 0 and 1 (X_normalized and y_normalized). Then, normalized 
values are split to training (80%) and testing (20%) datasets (Figure 6). 


[^] #Part 3: create the dataset, normalize data, and split dataset into train and test 
#create the dataset 
#define flattening function 
def flatten(t): TN 
return [item for sublist in t for item in sublist] 
| 


#create the dateset 
X=[] 
for i in range(len(proj nbr)): 
x1=[] 
xl.append(projquant[proj nbr[i]]) > Create the dataset 
xl.append([proj cost adj[proj nbr[i]]]) 
X.append(flatten(x1) ) 


yz] 
for i in range(len(proj nbr)): 
vip] 
yl.append(proj time[proj nbr[i]]) 
y-append(yl) r 
RE N 
##normalization 


#normalize X 
scaler = MinMaxScaler() 


scaler.fit(X) 

X normalized = scaler.transform(X) > Normalization 

#normalize y Split into 
scaler = MinMaxScaler() 


scaler. fit(y) train and test 


y_normalized=scaler.transform(y) 


zi 
#split the dataset into 80% train and 20% test t 
X train, X test, y train, y test = train test split(X normalized, y normalized, test size = 0.2, random state = 0) 
y train-np.array(y train) 
y test-np.array(y test) 


Figure 6.6 Screenshot of Part 3 of the code 


6.3.4 Part 4: Develop a new ANN Model 


Part 4 of the code defines the ANN layers, activation functions, training epochs, and trains the 
model using the training dataset, then computes the model's MSE on the test dataset (Figure 6.7). 
While the number of layers and activation functions cannot be modified to keep the AI-PDET 
consistent, some hyperparameters can be changed, such as the number of training epochs and the 
method of optimization. 
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#Part 4: develop the ANN model and evaluate id 
fdefine the ANN layers, hyperparameters, and training 
model - Sequential() 


model.add(Dense(500, input dim-len(X normalized[0]), activation- "relu")) 

model.add(Dense(100, activation- "relu")) ANN model 
model.add(Dense(50, activation- "sigmoid")) development 
model.add(Dense(1)) 


model.compile(loss- "mean squared error" , optimizer-"adam", metrics-["mean squáred error"]) 
model.fit(X train, y train, epochs-200) 


fevaluate the model 
pred = model.predict(X test) 


Model 


#measure MSE error. . 
Score - metrics.mean squared error(pred,y test) evaluation 
print("Final score (MSE): {}".format(score) ) 


Figure 6.7 Screenshot of Part 4 of the code 


6.3.5 Part 5: Develop a new Regression Model 


Part 5 ofthe code executes the regression model and exports the coefficients and the intercept. The 
evaluation measurements, such as the regression score and the MSE are reported (Figure 6.8). 


[5 


#Part 5: develop the regression model and evaluate it 
regr - LinearRegression() 

regr.fit(X train, y train) 

print('regression score is:',regr.score(X test, y test)) 


y pred-regr.predict(X test) 
print('regression MSE is',mean squared error(y test, y pred, multioutput-'raw values'))| 


regression score is: 0.7152690126304917 
regression MSE is [0.00398994] 
regression bias is: [0.05326561] 


Figure 6.8 Screenshot of Part 5 of the code 


6.4 Step 3: update the AI-PDET 


In step three, the attributes of the models are extracted into Excel files using the code, and the files 
are then transferred into the AI-PDET. 


6.4.1 Part 6: Extract the Attributes of the new ANN and Regression Models 


Part 6 of the code extracts both the ANN and the regression attributes and saves them into Excel 
files (Figure 6.9). A matrix of weights and biases and an activation function are available for each 
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layer of the ANN model. Running this part of the code will automatically save weights matrices 
(Wo to W3) and biases matrices (Bo to Bi). Additionally, it produces the minimum and maximum 
values of the input and output variables that are necessary for the normalization process. The 
coefficients of the regression model are exported, and the regression intercept is reported. 


Q #Part 6: extract the ANN's model weights and biases into excel files 

c=[] 

w-[] 

for layer in model.layers: 
c.append(layer.get config()) 
w.append(layer.get weights()) 

#extract weights and biases 

pa. Daren ene UU LOLs excel (wO: xex) ANN model ‘s 

pd.DataFrame(w[0][1]).to excel('b0.xlsx') : 

pd.DataFrame(w[1][0]).to excel('wl.xlsx') attributes 

pd.DataFrame(w[1][1]).to excel('bl.xlsx') 

pd.DataFrame(w[2][0]).to excel('w2.xlsx') 

pd.DataFrame(w[2][1]).to excel('b2.xlsx') 

pd.DataFrame(w[3][0]).to excel('w3.xlsx') 

pd.DataFrame(w[3][1]).to excel('b3.xlsx') 

#exctract min and max to reverse normalization HEPPÉ 

minl-np.min(X,axis-0) Normalization 

maxl-np.max(X,axis-0) vectors 

pd. Datarremo (MinT) to whee (minx zier) Regression 

pd.DataFrame(maxl).to excel( 'maxx.xlsx') 

print('minimum of y:', np.min(y)) attributes 

print('maximum of y:', np.max(y)) 

#extract coefficients and intercepts of the regression into excel files ji 

pd.DataFrame(regr.coef ).to excel('regressioncoeff.xlsx') 

print('regression bias is:',regr.intercept ) 


(> minimum of y: 1.0 
maximum of y: 549.0 


regression bias is: [0.05326561] 


Figure 6.9 Screenshot of Part 6 of the Code 


6.5 Transfer Matrices to AI-PDET 


The AI-PDET includes hidden sheets with the same name as the models’ attribute matrices. Three 
steps are needed to move the matrices to the proper location in the AI-PDET. 


D Step 1: unhide sheets in the AI-PDET; To unhide a sheet, right click on the home page, 
then select *unhide", then select the matrix you want to transfer (Figure 6.10.a). 

II) Step 2: transfer the ANN models’ weights and biases from the code's output files to 
the AI-PDET (Figure 6.10.b). 

III) Step 3: transfer the regression model’s attributes and the normalization values to the 
AI-PDET (Figure 10.c). 
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c) Step 3: transfer the regression attributes and the normalization values to the AI-PDET 


Figure 6.10 Steps to transfer values from the code to the AI-PDET 
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6.6 Summary 


This chapter explained how to update the AI-PDET using new project data. The coding was fully 
explained to make users understand how the code works and how to use it to take new data and 
update the database and the key attributes of the prediction models. To simplify the process, the 
code generates model attributes in MS Excel format. The updated AI-PDET would help users to 
obtain project duration predictions that are more relevant to recent projects. 
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